A Mathematical View of Latent Semantic Indexing: Tracing Term Co-occurrences
Authors
Abstract
Current research in Latent Semantic Indexing (LSI) shows improvements in performance for a wide variety of information retrieval systems. We propose the development of a theoretical foundation for understanding the values produced in the reduced form of the term-term matrix. We assert that LSI’s use of higher orders of co-occurrence is a critical component of this study. In this work we present experiments that precisely determine the degree of co-occurrence used in LSI. We empirically demonstrate that LSI uses up to fifth order term co-occurrence. We also prove mathematically that a connectivity path exists for every nonzero element in the truncated term-term matrix computed by LSI. A complete understanding of this term transitivity is key to understanding LSI.
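The connectivity claim can be illustrated on a toy example. The following is a minimal sketch, not the authors' experimental code: the matrix `A`, the helper names `truncated_term_term` and `cooccurrence_order`, the choice `k = 2`, and the tolerance `1e-9` are all assumptions made for illustration. It builds the rank-k term-term matrix from a truncated SVD and checks that every nonzero entry belongs to a pair of terms linked by a chain of co-occurrences.

```python
from collections import deque

import numpy as np

# Hypothetical toy term-document matrix (rows: terms, columns: documents).
# Terms 0-2 appear only in documents 0-1; terms 3-4 only in documents 2-3.
A = np.array([
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
], dtype=float)


def truncated_term_term(A, k):
    """Rank-k term-term matrix U_k S_k^2 U_k^T from the SVD of A."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    Uk, sk = U[:, :k], s[:k]
    return (Uk * sk**2) @ Uk.T


def cooccurrence_order(A):
    """Shortest co-occurrence path length between every pair of terms.

    Order 1 means the two terms share a document, order 2 means they are
    linked through one intermediate term, and -1 means no path exists.
    """
    cooc = (A @ A.T) > 0          # first-order co-occurrence graph
    n = cooc.shape[0]
    order = np.full((n, n), -1, dtype=int)
    for start in range(n):
        order[start, start] = 0
        queue = deque([start])
        while queue:              # breadth-first search from `start`
            u = queue.popleft()
            for v in np.flatnonzero(cooc[u]):
                if order[start, v] == -1:
                    order[start, v] = order[start, u] + 1
                    queue.append(v)
    return order


k = 2
T_k = truncated_term_term(A, k)
order = cooccurrence_order(A)

# Treat entries below a small tolerance as zero (floating-point noise).
nonzero = np.abs(T_k) > 1e-9

# Every nonzero entry of the truncated matrix has a connectivity path.
assert np.all(order[nonzero] >= 0)

print(np.round(T_k, 3))
print(order)
```

In this toy matrix, terms 0 and 2 never share a document, yet their entry in the truncated matrix is nonzero because both co-occur with term 1, a second-order co-occurrence. Pairs of terms drawn from the two disconnected groups keep a zero entry.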
Similar resources
A framework for understanding Latent Semantic Indexing (LSI) performance
In this paper we present a theoretical model for understanding the performance of Latent Semantic Indexing (LSI) search and retrieval applications. Many models for understanding LSI have been proposed. Ours is the first to study the values produced by LSI in the term dimension vectors. The framework presented here is based on term co-occurrence data. We show a strong correlation between second ...
Term Representation with Generalized Latent Semantic Analysis
Document indexing and representation of term-document relations are very important issues for document clustering and retrieval. In this paper, we present Generalized Latent Semantic Analysis as a framework for computing semantically motivated term and document vectors. Our focus on term vectors is motivated by the recent success of co-occurrence based measures of semantic similarity obtained fr...
Multi-view learning via probabilistic latent semantic analysis
Multi-view learning has aroused a vast amount of interest in the past decades, with numerous real-world applications in web page analysis, bioinformatics, image processing and so on. Unlike most previous work, which follows the idea of co-training, in this paper we propose a new generative model for Multi-view Learning via Probabilistic Latent Semantic Analysis, called MVPLSA. In this model, we jointl...
Memory-restricted latent semantic analysis to accumulate term-document co-occurrence events
DOI: http://dx.doi.org/10.1016/j.patrec.2012.05.002. This paper addresses a novel adaptive problem of obtaining a new type of term-document weight. In our problem, an input is given by a long sequence of co-occurrence events between terms and documents, namely, a stream ...
Latent Semantic Kernels for Feature Selection (produced as part of the ESPRIT Working Group in Neural and Computational Learning II, NeuroCOLT2 27150)
Latent Semantic Indexing is a method for selecting informative subspaces of feature spaces. It was developed for information retrieval to reveal semantic information from document co-occurrences. The paper demonstrates how this method can be implemented implicitly in a kernel-defined feature space and hence adapted for application to any kernel-based learning algorithm and data. Experiments with...
Publication date: 2002